Figure 5.

70

81  From: User1 [User1@aol.com]
    Sent: Monday, January 22, 2001 8:33 PM
    To: User3@aol.com
    Subject: FW: Original Message   <- 83

82  This message is a forwarded email message.

    Original Message   <- 84

    From: User2 [mailto:User2@aol.com]
    Sent: Monday, January 22, 2001 8:31 PM
    To: User1
    Subject: RE: Original Message   <- 79

78  This message is a reply email message.

    Original Message   <- 80

75  From: User1 [mailto:User1@aol.com]
    Sent: Monday, January 22, 2001 8:30 PM
    To: User2@aol.com
    Subject: Original Message   <- 76

74  This message is an original email message.



100  Start
101  Load all source message stores
102  Create Shadow Store
103  Determine number of passes n required to process source message stores
104  For i = 1 to n, do
105      Process Messages
106  End Do
107  Close all message stores
108  Optionally reinsert duplicate and near duplicate messages
     End



120  Create Shadow
121  Set message counter to zero
122  For each source message store, do
123      Create folder corresponding to each source message store in the Shadow Store
124      For each folder in current source message store, do
125          Increment message counter by number of messages in folder being examined in current source message store
126          Create corresponding folder in Shadow Store
127          Create entry in keyed collection
128      End Do /* folder */
129  End Do /* source message store */
130  Return (message coun...)
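The Create Shadow steps above can be sketched roughly as follows. This is only an illustration: the `Store` and `Folder` classes, the dictionary used as the Shadow Store, and the `(store, folder)` key used for the keyed collection are all assumptions made for the example and are not taken from the drawings.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:  # hypothetical stand-in for a folder in a source message store
    name: str
    messages: list = field(default_factory=list)


@dataclass
class Store:  # hypothetical stand-in for a source message store
    name: str
    folders: list = field(default_factory=list)


def create_shadow(source_stores):
    """Mirror the folder layout of every source store and count the messages."""
    message_count = 0          # "Set message counter to zero"
    shadow = {}                # assumed layout: store name -> {folder name -> []}
    keyed = {}                 # assumed keyed collection: (store, folder) -> shadow folder
    for store in source_stores:
        # Folder in the Shadow Store corresponding to this source store
        shadow[store.name] = {}
        for folder in store.folders:
            # Increment counter by the number of messages in this folder
            message_count += len(folder.messages)
            # Corresponding (initially empty) folder in the Shadow Store
            shadow[store.name][folder.name] = []
            # Entry in the keyed collection for later lookup
            keyed[(store.name, folder.name)] = shadow[store.name][folder.name]
    return shadow, keyed, message_count
```

The returned count could then be used to decide how many passes are needed over the source message stores.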



Figure 8.

140  Process Messages
141  [step label not recoverable]
142  For each message in selected folder, do
143      Extract topic, store ID information, folder entry ID and message entry ID (metadata) into Master Array
144  End Do /* selected folder */
145  Sort messages by topic
146  Process Master Array
147  Process log




Figure 9.

160  Process Master Array
161  For each message, do
         Compare topics for adjacent messages
163      Mark first message as beginning of topic range
167      Extract message as unique message
168      Extract each topically identical message and transmission time into topic array
169      Sort topic array by plain text body
170      Process topic array
171  End Do
     Return
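As a rough illustration of the Process Master Array flow, the sketch below sorts by topic, treats each run of adjacent identical topics as a topic range, and sorts each multi-message range by plain text body. The dictionary keys `topic` and `body` are assumed names for this example only.

```python
from itertools import groupby


def process_master_array(master):
    """Split messages into unique messages and per-topic arrays.

    `master` is a list of dicts with hypothetical keys 'topic' and 'body'.
    """
    unique, topic_arrays = [], []
    # Sorting by topic places topically identical messages next to each other.
    ordered = sorted(master, key=lambda m: m["topic"])
    # groupby compares adjacent topics, so each run is one topic range.
    for _, run in groupby(ordered, key=lambda m: m["topic"]):
        run = list(run)
        if len(run) == 1:
            unique.append(run[0])        # no topical match: unique message
        else:
            # Topic array sorted by plain text body for the next stage
            topic_arrays.append(sorted(run, key=lambda m: m["body"]))
    return unique, topic_arrays
```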



Figure 10A.

180  Process Topic Array
181  For each message in topic array, do
182      Compare plain text body of current message to plain text body of next message
183      [decision; label not recoverable, branch marked N]
184      Verify exact duplicate by comparing sender, header information and transmission time
185      [decision; label not recoverable, branch marked N]
186      Mark first message as exact duplicate and save ID information on first and second messages
187  End Do
188  Eliminate duplicate messages from topic array
189  For each message in topic array, do
190      Search for thread markers in message
192      [decision; label not recoverable, branch marked N]
193      Record zero thread markers
         Record number of thread marker occurrences
194  End Do
195  Sort topic array in order of increasing thread markers
196  For each message in topic array, do
     B
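The exact-duplicate pass over a topic array might look roughly like the sketch below. It assumes the array is already sorted by plain text body and that each message is a dict with the hypothetical keys `id`, `body`, `sender`, `headers` and `sent`; none of these names come from the drawings.

```python
def mark_exact_duplicates(topic_array):
    """Return (kept_messages, duplicate_pairs) for a body-sorted topic array."""
    kept, duplicates = [], []
    # Pair every message with its successor; the last message has none.
    for current, nxt in zip(topic_array, topic_array[1:] + [None]):
        is_duplicate = (
            nxt is not None
            and current["body"] == nxt["body"]          # plain text bodies match
            and current["sender"] == nxt["sender"]      # verification: sender
            and current["headers"] == nxt["headers"]    # verification: header information
            and current["sent"] == nxt["sent"]          # verification: transmission time
        )
        if is_duplicate:
            # The first message of the pair is marked; IDs of both are saved.
            duplicates.append((current["id"], nxt["id"]))
        else:
            kept.append(current)
    return kept, duplicates
```

Dropping the marked messages from `kept` corresponds to eliminating duplicate messages from the topic array before thread markers are counted.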



Figure 10C.

197      Select message
198      Select next message
199      Compare plain text body of messages
201      Mark first message as near duplicate and save ID information on first and second messages
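A minimal sketch of the thread-marker and near-duplicate stages follows. Two points are assumptions rather than anything stated in the drawings: the marker text is taken to be the "Original Message" line shown in the Figure 5 example, and the body comparison is implemented as a simple containment test (the earlier message's body appearing inside a later one). The `id` and `body` keys are likewise hypothetical.

```python
THREAD_MARKER = "Original Message"  # assumed marker text, as in the Figure 5 example


def mark_near_duplicates(topic_array):
    """Return (first_id, second_id) pairs where the first is a near duplicate."""
    # Sort in order of increasing thread markers (zero markers come first).
    ordered = sorted(topic_array, key=lambda m: m["body"].count(THREAD_MARKER))
    near = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # ASSUMPTION: "compare plain text body" is modeled as containment.
            if first["body"].strip() in second["body"]:
                near.append((first["id"], second["id"]))
                break  # one containing message is enough to mark `first`
    return near
```

With the three messages of Figure 5, the original would be marked as a near duplicate of the reply, and the reply as a near duplicate of the forward, leaving only the forward to be kept.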



Figure 11.

220  Process Log
221  For each message in master array, do
         [reference numerals 222, 223, 224, 226 and 225 appear here; their step labels are not recoverable]
         Skip message as near duplicate or duplicate message
         Retrieve copy of message from source message store and place in corresponding message store in Shadow Store
         Create log entry indicating source and ID information on message(s)
     End Do
     Return
Figure 16.

310  Start
311  Obtain message stores
312  Extract Messages
313  De-Dup Matches
     End

Figure 17A.



320  Extract Messages
321  For each message store, do
322      For each message, do
323          Extract message from archived message store
324          Digest extracted message into hash code



Figure 17B.

325          Parse metadata and message properties and store into database with hash code as file record indexed by unique identifier
327          For each attachment, do
328              Digest attachment into hash code
329          Next /* attachment */
330          Concatenate message hash code and each attachment hash code into compound hash code and store into database as compound document record
331      Next /* message */
332  Next /* message store */
     Return
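The hashing steps of Extract Messages can be illustrated as below. The drawings do not name a digest algorithm, so SHA-256 from Python's standard `hashlib` is used purely as a stand-in; the in-memory dictionary acting as the database and the record keys `hash` and `compound` are also assumptions for this sketch.

```python
import hashlib


def digest(data: bytes) -> str:
    """Digest bytes into a hex hash code (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(data).hexdigest()


def extract_message(message_id, body: bytes, attachments, database):
    """Store a file record for one message, plus a compound hash if it has attachments."""
    message_hash = digest(body)
    # File record indexed by a unique identifier (here: message_id).
    record = {"hash": message_hash, "compound": None}
    if attachments:
        # Concatenate the message hash with each attachment hash, in order.
        record["compound"] = message_hash + "".join(digest(a) for a in attachments)
    database[message_id] = record
    return record
```

Two messages with identical bodies but different attachments then share a message hash code while differing in their compound hash codes.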



Figure 18A.

340  De-Dup Matches
341  For each message, do
342      Retrieve file record from database
343      Compound?
345          Get compound hash code (Yes branch)
344          Get message hash code
346  Next /* message */
347  Group messages by hash codes
348  For each group, do
349      Mark randomly selected message in group as unique
350      Mark remaining messages in group as exact duplicates
351  Next /* group */
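The grouping stage of De-Dup Matches could be sketched as follows. The record layout (a dict with hypothetical `hash` and `compound` keys, where `compound` is `None` for messages without attachments) and the status strings are assumptions for the example.

```python
import random
from collections import defaultdict


def dedup_matches(database, rng=random):
    """Group messages by hash code and mark one per group as unique."""
    groups = defaultdict(list)
    for message_id, record in database.items():
        # Compound documents are keyed by compound hash, others by message hash.
        key = record["compound"] or record["hash"]
        groups[key].append(message_id)

    status = {}
    for ids in groups.values():
        chosen = rng.choice(ids)   # randomly selected message in the group
        for message_id in ids:
            status[message_id] = "unique" if message_id == chosen else "exact duplicate"
    return status
```

Passing a seeded `random.Random` instance as `rng` makes the selection repeatable, which can be convenient when comparing runs.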



Figure 18B.

352  Group messages by conversation thread
353  Sort messages by body length
354  For each thread, do
355      For each message, do
356          For each shorter message, do
357              Compare message bodies
                 [step labels for 358 and 359 not recoverable]
360              Compare attachment hash codes



